Abstract

Copy-number variants (CNVs) reshape gene structure, modulate gene expression, and contribute to significant phenotypic 
variation. Previous studies have revealed CNV patterns in natural populations of Drosophila melanogaster and suggested that 
selection and mutational bias shape genomic patterns of CNV. Although previous CNV studies focused on heterogeneous 
strains, here, we established a number of second-chromosome substitution lines to uncover CNV characteristics when 
homozygous. The percentage of genes harboring CNVs is higher than found in previous studies. More CNVs are detected in 
homozygous than heterozygous substitution strains, suggesting that comparative genomic hybridization arrays
underestimate CNV owing to heterozygous masking. We incorporated previous gene expression data collected from some of
the same substitution lines to investigate relationships between CNV gene dosage and expression. Most genes present in 
CNVs show no evidence of increased or diminished transcription, and the fraction of such dosage-insensitive CNVs is greater 
in heterozygotes. More than 70% of the dosage-sensitive CNVs are recessive with undetectable effects on transcription in 
heterozygotes. A deficiency of singletons in recessive dosage-sensitive CNVs supports the hypothesis that most CNVs are 
subject to negative selection. On the other hand, relaxed purifying selection might account for the higher number of 
protein-protein interactions in dosage-insensitive CNVs than in dosage-sensitive CNVs. Dosage-sensitive CNVs that are 
upregulated and downregulated coincide with copy-number increases and decreases. Our results help clarify the relation 
between CNV dosage and gene expression in the D. melanogaster genome. 
Introduction

Recent analyses of structural genetic variation have high- 
lighted the presence of extensive naturally occurring 
copy-number variants (CNVs) in organisms as diverse as hu- 
mans, fruit flies, yeast, and plants (Sebat et al. 2004; Snijders 
et al. 2005; Perry et al. 2006; Redon et al. 2006; Dopman 
and Hartl 2007; Emerson et al. 2008; McCarroll et al. 2008; 
Carreto et al. 2008; DeBolt 2010). About 10% of the human
genome harbors CNVs (Redon et al. 2006), with an estimated
average of 12 CNVs per individual relative to a reference
sequence (Feuk et al. 2006). In humans, a number of studies
have indicated links between CNV and disease phenotypes 
(McCarroll and Altshuler 2007; Zhang et al. 2009; WTCCC 
2010), whereas only a handful of studies have been con- 
ducted in natural populations of Drosophila melanogaster. 
Similar to estimates from the human genome, about
5-8% of the D. melanogaster genome was estimated to
contain CNVs (Dopman and Hartl 2007; Emerson et al. 
2008; Cridland and Thornton 2010). The nonrandom distribution
of CNVs in D. melanogaster suggests that selection
and mutational biases are primary forces that shape
structural variation (Dopman and Hartl 2007; Emerson 
et al. 2008). Furthermore, the occurrence of CNVs was 
found to be negatively associated with the abundance of 
protein-protein interactions (Dopman and Hartl 2007). To 
date, all the reported CNV analyses in D. melanogaster were
based on heterogeneous isofemale strains from natural 
populations. Many CNVs are presumably heterozygous in 
these lines, which is problematic because the incidence of 
CNVs may be underestimated. Hence, a better resolution 
of CNVs may be expected from studies of homozygous 
genotypes. 
In the human genome, about half of the CNVs detected
overlap with protein-coding regions (Sebat et al. 2004),
changing gene structure and dosage. Therefore, CNV loci
encompassing genes may potentially affect gene expression,
which can subsequently shape ecologically, evolutionarily,
and medically relevant phenotypes (Stranger et al.
2007; Henrichsen et al. 2009; Schuster-Bockler et al. 2010).

Compensatory mechanisms are commonly invoked in at- 
tempts to understand the functional and evolutionary con- 
sequences of ploidy and sex determination (Birchler et al. 
2007; Vicoso and Bachtrog 2009), but dosage must also 
be important for CNV loci encompassing individual genes. 
Indeed, disruption in the stoichiometric balance of proteins 
belonging to molecular complexes may affect gene expres- 
sion (Birchler et al. 2005). The effects of aneuploidy result 
from a change in the relative dosage balance among various 
regulatory components that arise due to unbalanced alter- 
ations in gene copy number (Birchler et al. 2001, 2005). 
Dosage sensitivity is an essential evolutionary mechanism 
that influences gene dispensability. Although the underlying 
causes of dosage sensitivity remain poorly understood, pre- 
vious reports suggested a complex relationship between 
haploinsufficiency and duplication sensitivity (Veitia 2002). 
This complexity may be explained by the balance hypothesis
(Birchler et al. 2007) in which multiprotein complexes need 
to maintain the stoichiometry of their subunits to perform 
biological functions (Papp et al. 2003). As CNVs harboring 
duplications and deletions potentially create gene dosage 
effects, understanding the balance between CNV gene dos- 
age and expression should shed light on the evolution of 
CNVs and how CNVs affect gene regulation. 

It was previously reported that more than 70% of genes 
in D. melanogaster that are differentially expressed in con- 
trasts between homozygous genotypes lack expression dif- 
ferences when in the heterozygous state (Lemos et al. 
2008). This result suggested that recessive alleles with reg- 
ulatory consequences might be abundant in Drosophila 
(Lemos et al. 2008). Because gene heterozygosity is prevalent
in natural populations (Singh and Rhomberg 1987), the
expression of genes encompassed in CNV could also be 
largely masked in heterozygotes. 

Here, we addressed the relevance of CNV in homozygous 
and heterozygous genotypes to reveal dosage effects of 
CNVs. We generated six second-chromosome substitution 
homozygous lines and two heterozygous lines to investigate 
CNV patterns. We also utilized gene expression data collected
from some of the same substitution lines to infer
the association between CNVs and their gene expression.
We found that most CNVs appear to have low levels of dosage
sensitivity, and they are often recessive in the heterozygous
state. Nevertheless, increases and decreases in copy number
coincide with up- and downregulation in a number of cases. 
Overall, our work highlights complex relationships between 
gene dosage and expression. 



Materials and Methods 

Fly Stocks 

Some of the second-chromosome substitution strains (PS1,
PS2, PS3, CS) in this study were previously described by
Lemos et al. (2008). Strains PS4 and PS5 were established
using identical methodology (supplementary fig. S1 in
Lemos et al. 2008). Heterozygous strains PS2/CS and PS5/CS
were obtained in the F1 generation of crosses between homozygous
second-chromosome substitution strains and contain two different
second chromosomes in an otherwise identical
genetic background. A total of eight strains were assayed.

DNA Isolation and Digestion 

Genomic DNA was isolated from either 40 adult females or 
60 males, using QIAGEN DNeasy blood and tissue kit (Cat. 
No. 695004). Genomic DNA was then digested with 1.5 µl
MspI enzyme to randomly digest the genome into moderately
sized fragments (average size ~3.5 kb, Barker
et al. 1984). Restriction digests followed the manufacturer's
recommendations (New England BioLabs, 20,000 U/ml) of
37 °C for 1 h; an equal amount of enzyme was added
for an additional 1 h to ensure complete digestion. DNA
was further cleaned by Phase Lock Gel (Eppendorf) and phenol
purification. Five micrograms of DNA was used for each
sample, resulting in 10 µg of DNA in each microarray reaction.

Microarray Platform 

Array comparative genomic hybridizations (aCGH) were per- 
formed with an 18,000-feature DNA microarray. Labeling 
and hybridization were conducted with the 3DNA Array 
900 MPX kit (Genisphere), with a Cy5-Cy3 two-channel 
dye swap for each reaction. All the DNA copy-number 
increases and decreases in the other seven sampled strains 
were estimated relative to PS1. Besides the dye swap, every
reaction had at least two replicates (experimental design 
shown in supplementary fig. S1, Supplementary Material 
online). Upon hybridization, microarray slides were scanned 
in an Axon 4000B scanner (Axon Instruments). Gene expres- 
sion microarrays, experimental designs, and previous results 
used in this study were obtained from Lemos et al. (2008). 

Microarray Analyses 

Scanned microarray slides were first analyzed with GenePix 
Pro 6.0 software (Axon Instruments). Fluorescence Cy5 and 
Cy3 intensities were then normalized with the limma package
for R (version 2.10.1). Two different methods were
used to ascertain copy-number increases and decreases: 
threshold analysis and Bayesian Analysis of Gene Expression 
Levels (BAGEL). In threshold analysis, a probe was taken
to indicate a CNV event if the
standard error of the log-intensity ratio was beyond an
intensity-ratio threshold. The threshold ratio was established
from self-self hybridizations in the reference strain, by
controlling the false-positive rate to <1% (for more details, see
Dopman and Hartl 2007). BAGEL analysis uses a Bayesian
algorithm to compute the probe signal ratios between samples
and the reference strain, with P values indicating the
significance (for more details, see Townsend and Hartl
2002; Lemos et al. 2008). Array probes located in transpo- 
sons or containing repetitive sequences were removed from 
the analyses. The two methods were in good agreement and 
the patterns herein described are robust to the choice of 
method for ascertaining gains and losses. Only the threshold
results are shown in detail in the Results.
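
The calibration of a threshold from self-self hybridizations can be illustrated with a minimal sketch. It is not the pipeline used in this study (see Dopman and Hartl 2007 for that). It assumes that the threshold is the smallest absolute log ratio that fewer than 1% of self-self probes exceed, and it applies the threshold to the log ratio itself, which is a simplification. The function names and the toy values are hypothetical.

```python
from bisect import bisect_right


def calibrate_threshold(self_self_log_ratios, max_false_positive=0.01):
    """Smallest |log ratio| cutoff that fewer than max_false_positive of
    the self-self probes exceed (assumed calibration rule)."""
    abs_ratios = sorted(abs(r) for r in self_self_log_ratios)
    n = len(abs_ratios)
    for cutoff in abs_ratios:
        # Number of self-self probes strictly beyond this candidate cutoff.
        n_beyond = n - bisect_right(abs_ratios, cutoff)
        if n_beyond / n < max_false_positive:
            return cutoff
    return abs_ratios[-1]


def call_cnv(log_ratio, threshold):
    """Call a copy-number change for one probe (sample vs. reference)."""
    if log_ratio > threshold:
        return "increase"
    if log_ratio < -threshold:
        return "decrease"
    return "none"


# Toy self-self data: 199 probes near zero and one outlier.
toy_self_self = [0.1] * 199 + [0.8]
threshold = calibrate_threshold(toy_self_self)
calls = [call_cnv(r, threshold) for r in (0.8, -0.5, 0.05)]
```

With these toy values, only the single outlier lies beyond the calibrated cutoff, so the false-positive fraction in the self-self data is 1 in 200.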

Analyses Schemes for Associations between CNVs 
and Gene Expression 

To determine dosage effects of CNVs in homozygotes, "hor- 
izontal" comparisons of gene expression levels between ho- 
mozygous PS2 and PS1 were conducted as illustrated in 
figure 2. To determine dosage effects of CNVs in heterozy- 
gotes, horizontal comparisons of gene expression levels be- 
tween heterozygous PS1/PS3 and PS2/PS3 were conducted. 
To determine recessive CNVs (heterozygous masking effect), 
the results collected from homozygous PS2 and PS1 compar- 
isons were combined with those for PS1/PS3 and PS2/PS3 
heterozygotes to infer if the same CNV genes in PS2 are 
recessive to PS3. To determine whether upregulated or downregulated
CNV genes are matched with copy-number increases or decreases,
"vertical" comparisons between PS1/PS1 and PS1/PS3
were conducted, as also shown in figure 2. In this
comparison, PS3 harbors CNVs relative to PS1, whereas in the
horizontal comparisons the PS3 allele does not differ from PS1.

Protein Interactions for CNV Genes 

The interaction data set from BIOGRID (Stark et al. 2006; 
http://thebiogrid.org/) was used to detect protein-protein 
interactions for dosage-sensitive, dosage-insensitive, recessive,
and nonrecessive CNV genes in different contexts. Only
genes with ≥1 interaction were analyzed.

Results 

Populations of D. melanogaster can be polymorphic for as 
many as 43% of their gene loci, and an average individual 
typically shows a level of heterozygosity on the order of 
10% (Singh and Rhomberg 1987). Several recent studies 
have assessed CNVs with microarray and sequencing technologies
using genetically heterogeneous isofemale strains
(Dopman and Hartl 2007; Emerson et al. 2008; Cridland and
Thornton 2010). However, the contribution of copy-number
heterozygosity to estimates of copy-number variation is dif- 
ficult to evaluate. Therefore, we investigated CNVs in com- 
pletely homozygous chromosome substitution lines, which 
differ exclusively in the origin of the second chromosome
but are otherwise genetically identical. All the second chromosomes
were derived from a single Pennsylvania population
(except line CS), whereas other chromosomes
originated from the marker lines used to construct these
substitution lines (for details, see supplementary fig. S1 in 
Lemos et al. 2008). These second-chromosome substitution 
strains offer two major advantages. First, false positive error 
rates can be experimentally ascertained because no CNVs 
are expected to be found from probes located in the third, 
fourth, and X chromosomes. Second, chromosomes are ho- 
mozygous within each strain, and so issues of detection as- 
sociated with identifying CNVs in heterozygotes can be 
avoided. In this study, we utilized six homozygous and 
two heterozygous second-chromosome substitution lines, 
originally established by Lemos et al. (2008), to reveal the 
CNV patterns. 

Variation in gene expression levels contributes to dra- 
matic phenotypic differences between individuals and pop- 
ulations. Gene copy-number differences among individuals 
and populations can provide a source of gene expression 
variation (Stranger et al. 2007), although evidence suggests 
complex relationships between gene copy number and expression
(Birchler et al. 2005, 2007). Changes in dosage of
individual chromosomes or chromosomal segments have
more extreme global effects on gene expression than those
observed in ploidy series (Birchler et al. 2007). The balance
between CNV gene dosage and expression levels can indicate
how strongly gene copy-number variation, as well as gene structural
changes induced by CNVs, may affect gene regulation.
Here, we addressed the extent of copy-number variation 
across chromosomes sampled from a single population (except
for strain CS) and also combined these CNV data with
previously reported gene expression data (Lemos et al. 
2008) to investigate the balance between gene dosage 
and expression. 

Validation of Methods Used in the Detection of 
CNVs 

As females have two copies of X-linked genes and males 
only have one copy, male-female aCGH results in an excess
of female signals for X-linked genes that can be used to cal- 
ibrate the threshold values and detection methods. Indeed, 
lower signal ratios between male and female X-linked genes 
are reflected in supplementary figure S2A (Supplementary 
Material online) (based on threshold analysis; data on 
a log scale). In addition, only the second chromosomes in 
the substitution lines may be expected to contain gene 
copy-number variation, as all other chromosomes are in 
principle invariant across all strains. Indeed, as shown in supplementary
figure S2B (Supplementary Material online), in
one of the substitution strains (PS2) relative to the reference 
strain PS1, the second chromosome contains virtually all of 
the CNVs detected by microarray hybridizations. Other 
strains in this study all showed similar patterns (data not
shown). With regard to BAGEL analyses, supplementary 
figure S3 (Supplementary Material online) demonstrates 
the distributions of probabilities of CNV occurrence in the 
sample strain PS2 compared with the reference strain 
PS1. As expected, the distribution of P values for probes located
in the second chromosome is notably skewed to low
(P < 0.05) or high (P > 0.95), indicating copy-number 
decrease and increase in PS2, respectively. In contrast, and 
in agreement with the expectation if the third, fourth, 
and X chromosomes are invariant, the distribution of P val- 
ues is uniform for pooled data from all other chromosomes. 
These observations suggest a substantial level of variation of 
gene copy numbers on the second chromosome that can be 
detected with two distinct methods. In the following, we only
show results and analyses based on the threshold method.
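
Because only the second chromosome is expected to vary among the substitution lines, calls on the other chromosomes give an empirical estimate of the false-positive rate. The sketch below illustrates that bookkeeping under stated assumptions: the per-probe calls, the chromosome labels, and the function names are hypothetical, and the toy data are not results from this study.

```python
def cnv_rate_by_chromosome(probe_calls):
    """Fraction of probes called as CNV on each chromosome.

    probe_calls is a list of (chromosome, call) pairs, where call is
    "increase", "decrease", or "none" (hypothetical encoding).
    """
    totals = {}
    hits = {}
    for chrom, call in probe_calls:
        totals[chrom] = totals.get(chrom, 0) + 1
        if call != "none":
            hits[chrom] = hits.get(chrom, 0) + 1
    return {chrom: hits.get(chrom, 0) / totals[chrom] for chrom in totals}


def empirical_false_positive_rate(probe_calls, variable_chromosome="2"):
    """Fraction of CNV calls among probes outside the substituted
    chromosome, where no true CNVs are expected."""
    other = [call for chrom, call in probe_calls if chrom != variable_chromosome]
    if not other:
        raise ValueError("no probes outside the variable chromosome")
    return sum(1 for call in other if call != "none") / len(other)


# Toy calls for six probes.
toy_calls = [
    ("2", "increase"), ("2", "none"),
    ("3", "none"), ("3", "none"),
    ("X", "decrease"), ("4", "none"),
]
rates = cnv_rate_by_chromosome(toy_calls)
fp_rate = empirical_false_positive_rate(toy_calls)
```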

CNVs in Homozygous Second-Chromosome Sub- 
stitution Strains 

The number and fractions of CNV increases and decreases 
in five homozygous and two heterozygous second- 
chromosome substitution strains, relative to the reference 
strain, are plotted in figure 1. Because sample arrays and
their replicates were not all prepared at the same time,
batch variation may result in different detection rates of
aCGH. The number of CNV increases between a sample
strain and PS1 ranged from ~100 to 350 and the number
of CNV decreases ranged from ~100 to 400 depending
on the batch and strain. However, the fraction of CNVs that
increased or decreased in number was found to be balanced
in each strain (fig. 1A). In some of the strains, such as PS2,
PS3, CS, and heterozygous PS5/CS, there were slightly more
copy-number decreases (on average 5%) than increases. In 
PS5, more increases (5%) were observed. The fractions 
were within 0.5% variation in PS4 and PS2/CS. 

The percentages of probes containing CNVs among all 
the detected genes on the second chromosomes are shown 
on top of the bars in figure 1A. For the five homozygous
strains, the average CNV fraction is about 9.5% (range from 
7.7% to 14.4%), a finding that is somewhat higher than 
previous reports of 5-8% in D. melanogaster (Dopman 
and Hartl 2007; Emerson et al. 2008; Cridland and Thornton 
2010). The levels of variation detected in heterozygous PS2/CS
(5.7%) and PS5/CS (6.1%) were lower than in their homozygotes.
Two factors can account for these lower percentages
in heterozygotes. First, some duplications and
deletions may complement each other in heterozygotes re- 
sulting in no variation compared with the reference strain. 
Second, and most likely in view of the observation that most 
CNVs are singletons (discussed in the following paragraphs), 
aCGH arrays may be less sensitive to detecting copy-number 
variation in heterozygotes. This is because whenever a CNV 
is unique to only one homozygous strain, the magnitude of
fold-change between the homozygous reference strain and
the heterozygote is less extreme.

Fig. 1. CNV composition in homozygous second chromosomes.
(A) Summary of CNV copy-number increases and decreases relative to
a reference strain PS1 in the seven second-chromosome substitution
lines: homozygous PS2, PS3, CS, PS4, and PS5; heterozygous PS2/CS
and PS5/CS. Bars represent the fractions of CNV increases (gray) and
decreases (black). The numbers on top of the bars show the number of
increases and decreases detected in each strain, respectively. The
percentages on top of the numbers indicate the percentage of genes
(probes) containing CNVs among all the detected genes (probes) from
the second chromosomes. (B) A pie chart demonstrating the fraction of
singleton and nonsingleton CNVs derived from the five homozygous
lines (CNV allele frequencies) relative to PS1. The black area shows the
fraction of singletons that contain decreased copies in one strain relative
to the other four strains and the reference strain PS1. The gray area
shows the fraction of singletons that contain increased copies in only
one strain. The white area shows the fraction of nonsingletons that
appear in more than one strain as either increase or decrease.

CNVs can be either clustered at certain regions or dis- 
persed across a whole chromosome. To distinguish CNVs 
from larger scale segmental duplications, we investigated 
CNV clustering by checking the fraction of CNVs that can 
be found in contiguous sets of more than three CNVs along 
the second chromosome. Minor clustering was found (supplementary
table S1, Supplementary Material online). None
of those clustered areas involved more than six genes. Instead,
CNVs were spread across the whole second chromosome.
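
The clustering check described above can be sketched as a run-length scan over probes ordered along the chromosome. This is only an illustration under assumptions: the input is a hypothetical list of per-probe CNV flags in chromosomal order, and "more than three" contiguous CNVs is encoded as a minimum run of four.

```python
def fraction_in_clusters(is_cnv, min_run=4):
    """Fraction of CNV probes that lie in runs of at least min_run
    consecutive CNV probes along the chromosome."""
    total = sum(1 for flag in is_cnv if flag)
    if total == 0:
        return 0.0
    clustered = 0
    run = 0
    # A trailing False acts as a sentinel that closes the last run.
    for flag in list(is_cnv) + [False]:
        if flag:
            run += 1
        else:
            if run >= min_run:
                clustered += run
            run = 0
    return clustered / total


# Toy example: one run of four CNV probes, one isolated probe, one run of three.
toy_flags = [True, True, True, True, False, True, False, True, True, True]
clustered_fraction = fraction_in_clusters(toy_flags)
```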

CNV Allele Frequency 

All CNVs present in the five homozygous strains were as- 
sessed for their allele frequencies. All copy-number in- 
creases and decreases were evaluated relative to PS1. 
Therefore, we did not know if a detected copy-number
increase or decrease represents a derived or ancestral allele.
We make the parsimonious assumption that the minor allele
(lower frequency) represents the derived state. For example,
a focal probe showing higher copy number across all
five "test" strains is most parsimoniously interpreted as
a copy-number reduction in PS1. As shown in figure 1B,
37.6% of CNVs were copy-number increases and 42.2% were
copy-number decreases unique to a single strain (singletons).
Singletons discovered in only one strain are more likely
to be true deletions and duplications relative to all the other
five strains, including reference strain PS1; otherwise one
would need to posit a shared CNV in the other five strains.
The remaining 20.2% of CNVs (nonsingletons) were detected more than
once among the five strains, among which 7.7% shared
copy-number increase only relative to PS1, 7.3% shared 
copy-number decrease only, and 5.2% showed either in- 
crease or decrease in different strains. The fractions of sin- 
gleton and nonsingleton between copy-number increase 
and decrease are not significantly different (Fisher's exact 
test, P = 0.193). 
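
The parsimony rule used for allele frequencies can be written down as a small classifier. This is a sketch under assumptions, not the procedure actually implemented for this study: calls are encoded as +1, 0, or -1 for each test strain relative to PS1, only the case in which all test strains share the same call is re-polarized to the reference (the example given in the text), and the function name and labels are hypothetical.

```python
def classify_allele(calls):
    """Classify one probe from the calls in the test strains.

    calls holds +1 (increase), -1 (decrease), or 0 (no change) for each
    test strain relative to the reference strain.
    """
    n_gain = calls.count(1)
    n_loss = calls.count(-1)
    if n_gain == 0 and n_loss == 0:
        return "invariant"
    if n_gain > 0 and n_loss > 0:
        return "nonsingleton (mixed)"
    direction = "increase" if n_gain > 0 else "decrease"
    n_carriers = n_gain + n_loss
    if n_carriers == len(calls):
        # Every test strain differs from the reference in the same way, so
        # the reference carries the minor (derived) allele instead.
        opposite = "decrease" if direction == "increase" else "increase"
        return "singleton " + opposite + " (reference strain)"
    if n_carriers == 1:
        return "singleton " + direction
    return "nonsingleton " + direction


examples = [
    classify_allele([1, 0, 0, 0, 0]),
    classify_allele([1, 1, 1, 1, 1]),
    classify_allele([0, -1, -1, 0, 0]),
]
```

Intermediate cases, such as a call shared by four of the five test strains, are left as called in this sketch; a stricter minor-allele rule would re-polarize those as well.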



Dosage Effects of CNVs on Expression in Homo- 
zygotes 

The majority of the array probes used in this study are located
in gene regions. In total, 11,934 genes are represented
on the array with an average of 1.2 probes per
gene (Hild et al. 2003; Dopman and Hartl 2007). Therefore, 
the same array platform can be used to compare gene copy 
number and expression variation. How gene expression lev- 
els and CNVs correlate with each other is essential to under- 
standing how structural changes induced by CNVs affect 
gene regulation. 

We began by investigating CNVs and their expression 
levels in homozygous PS2 and homozygous PS1. As shown
in figure 2, if a gene in homozygous PS2 showed both
an increase in copy number and expression level relative 
to that of the homozygous reference strain PS1, this focal 
gene is termed "dosage sensitive." A gene in PS2 that 
showed both a decrease in copy number and expression rel- 
ative to the reference is likewise termed dosage sensitive. 
Conversely, a gene showing an increase in copy number 
but a lower expression level than the reference is termed 
"dosage reversed." Genes whose expression levels do not 
change despite alterations in copy number are termed "dos- 
age insensitive." We observed that 21% of CNVs had 
matching expression variation, with 27 and 17 CNVs in
PS2 showing dosage-sensitive and dosage-reversed expression
phenotypes, respectively. On the other hand, 163 CNVs
(79%) showed no corresponding expression variation (dosage
insensitive) in the homozygous-homozygous comparison
between PS1 and PS2 (fig. 3A). The dosage effects for
gene copy-number increases and decreases on gene expression
levels were similar, as shown in figure 3B. Overall, 14.6%
and 11.5% of CNVs were dosage sensitive for copy-number increases
and decreases, respectively; 9.7% and 6.7% of CNVs
were dosage reversed for increases and decreases, respectively;
and 75.7% and 81.7% of CNVs were dosage insensitive
for increases and decreases, respectively. There was no significant
difference between CNV increases and decreases (Fisher's
exact test, P = 0.553). The largest fraction of CNVs fell into
the dosage-insensitive category, which is examined further
in the Discussion. More importantly, the absolute expression
levels for dosage-sensitive genes did not differ from those of
dosage-insensitive genes. Two dosage-sensitive CNV genes
are shown in figure 4, in which both Cyp6g1 and
CG31636 had copy-number increases and higher expression
levels in PS2 relative to PS1. A dosage-reversed gene,
CG15649, is also shown, which had a higher copy number
but lower expression level in PS2 compared with PS1.

Fig. 2. Diagrams of analyses of associations between CNVs and gene
expression. PS1, PS2, and PS3 are three second-chromosome substitution
strains. The horizontal solid lines represent CNV alleles in each strain.
The left panel shows a "horizontal" comparison between PS1/PS1 and
PS2/PS2 homozygotes as well as between PS1/PS3 and PS2/PS3
heterozygotes, where PS2 contains CNVs relative to PS1 but the PS3 allele
is the same as PS1. The right panel shows a "vertical" comparison
between PS1/PS1 and PS1/PS3, where the PS3 allele harbors CNVs.
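
The three dosage categories defined in this section reduce to a comparison of the direction of the copy-number change with the direction of the expression change. The sketch below encodes that rule; the +1/0/-1 encoding and the function name are assumptions made for illustration, and the counts are the ones reported above for the PS2 versus PS1 comparison.

```python
def classify_dosage(copy_change, expression_change):
    """Classify a CNV gene from the direction of its copy-number change and
    its expression change relative to the reference (+1, 0, or -1)."""
    if copy_change == 0:
        raise ValueError("gene shows no copy-number change")
    if expression_change == 0:
        return "dosage insensitive"
    if expression_change == copy_change:
        return "dosage sensitive"
    return "dosage reversed"


# Counts reported for homozygous PS2 versus PS1.
counts = {"dosage sensitive": 27, "dosage reversed": 17, "dosage insensitive": 163}
total = sum(counts.values())
responsive_pct = round(100 * (counts["dosage sensitive"] + counts["dosage reversed"]) / total)
insensitive_pct = round(100 * counts["dosage insensitive"] / total)
```

The two percentages recover the 21% and 79% given in the text from the reported counts.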

CNVs are Largely Recessive (Masked) in Hetero- 
zygotes 

Are changes in copy number resulting in expression changes 
in homozygous state recessive in heterozygotes? To address 
this issue, we considered CNVs in PS2 homozygotes with the 
expression phenotype manifested in the comparison be- 
tween homozygous PS1 versus PS2 and investigated if such 
expression differences were still present when heterozygous 
PS2/PS3 were contrasted with heterozygous PS1/PS3. For ex- 
ample, for a gene with both increased copy number and 
higher expression in homozygous PS2 relative to homozy- 
gous PS1 (fig. 2, dosage sensitive in homozygotes), the 
CNV is recessive if PS2/PS3 shows no expression difference 
from PS1/PS3 heterozygotes. In contrast, the CNV is nonre- 
cessive if a difference in expression observed in the homozy- 
gotes is maintained in the contrast between PS2/PS3 and 
PS1/PS3. One possible cause of the recessivity could be background
trans-factors from PS3. Nevertheless, the observation
of no expression difference suggests that the CNVs are masked in
the heterozygous background.

Fig. 3. Dosage effects of CNVs on expression in homozygotes. (A) In the left column, dosage sensitive indicates that CNVs have either
copy-number increases with higher expression levels than reference strain PS1 or copy-number decreases with lower expression levels than PS1. In
the right column, dosage reversed indicates the opposite (negative) associations. The middle panel shows the number of CNVs that are not sensitive to
copy-number dosage effects. (B) The bars indicate the fractions of dosage-sensitive (gray), dosage-insensitive (black), and dosage-reversed CNVs (white) for
copy-number increases and decreases separately.



Figure 4 panel titles: (A) Dosage-sensitive (Cyp6g1), Nonrecessive (Cyp6g1); (B) Dosage-sensitive (CG31636), Recessive (CG31636). Genotypes shown: PS1, PS2, PS1/PS3, PS2/PS3.

Fig. 4. Examples illustrating dosage sensitivity of CNV genes.
Expression indicates normalized estimates from BAGEL analysis. For all
panels, copy number is higher in PS2 than in PS1. Diamonds represent
genes, with credible intervals shown.
(A) Dosage-sensitive gene Cyp6g1 shows a copy-number increase and
higher expression level in PS2 relative to PS1. Its expression level is also
higher in PS2/PS3 heterozygotes compared with PS1/PS3, suggesting a
nonrecessive phenotype of the Cyp6g1 CNV gene. (B) Dosage-sensitive gene
CG31636 shows a copy-number increase and higher expression level in
PS2 relative to PS1. Its expression level is not different between PS2/PS3
and PS1/PS3 heterozygotes, suggesting a recessive phenotype of the CG31636
CNV gene. (C) Dosage-reversed gene CG15649 shows a copy-number
increase but lower expression level in PS2 relative to PS1.



Among the 27 dosage-sensitive CNVs, we observed 19 (70%) recessive
CNVs and 8 (30%) nonrecessive CNVs. Conversely, only 3% (5 of 163) of all
dosage-insensitive CNVs that did not show expression
differences in the homozygous PS2 versus PS1 contrast appeared
to show expression differences in the heterozygous
PS2/PS3 versus PS1/PS3 contrast. There were 17 dosage-reversed
CNVs in the homozygous PS2 versus PS1 comparison,
76% of which were recessive in the heterozygous PS2/PS3
versus PS1/PS3 contrast (fig. 5A). The absolute expression
levels for recessive genes did not differ from those of
nonrecessive genes (P = 0.10, Mann-Whitney test). In both
the dosage-sensitive and dosage-reversed groups, there
were more recessive CNVs than nonrecessive CNVs. As to
the dosage-insensitive CNVs, three CNVs harboring higher
copy number showed no expression difference in homozygotes
but higher levels in heterozygotes, and the other two
CNVs harboring lower copy number showed lower levels in
heterozygotes. The overall result suggests that expression
differences caused by gene copy-number changes are
largely masked in heterozygotes. Examples of nonrecessive
and recessive CNV genes are shown in the right panel of
figure 4. Gene Cyp6g1 has a copy-number increase
and a higher expression level in PS2/PS3 relative to PS1/PS3
heterozygotes. In contrast, gene CG31636 has a higher
copy number but its expression level in PS2/PS3 does not
differ from PS1/PS3, and it therefore appears recessive.
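
The recessive versus nonrecessive assignment combines the homozygous contrast (PS2 versus PS1) with the heterozygous contrast (PS2/PS3 versus PS1/PS3). A minimal sketch of that decision rule follows. The +1/0/-1 encoding, the function name, and the "unclassified" label for a reversed direction in heterozygotes are assumptions, since the text does not say how such cases were handled; the counts are those reported above for the dosage-sensitive CNVs.

```python
def classify_dominance(homozygous_diff, heterozygous_diff):
    """Classify a CNV gene with an expression difference in homozygotes.

    homozygous_diff: direction of the PS2 vs. PS1 expression difference.
    heterozygous_diff: direction of the PS2/PS3 vs. PS1/PS3 difference.
    Both are +1, 0, or -1.
    """
    if homozygous_diff == 0:
        raise ValueError("no expression difference in homozygotes")
    if heterozygous_diff == 0:
        return "recessive"  # difference is masked in heterozygotes
    if heterozygous_diff == homozygous_diff:
        return "nonrecessive"  # difference is maintained
    return "unclassified"  # direction flips; handling is an assumption


# Counts reported for the 27 dosage-sensitive CNVs in PS2.
n_recessive, n_nonrecessive = 19, 8
recessive_pct = round(100 * n_recessive / (n_recessive + n_nonrecessive))
nonrecessive_pct = round(100 * n_nonrecessive / (n_recessive + n_nonrecessive))
```

In this encoding, Cyp6g1 corresponds to `classify_dominance(1, 1)` and CG31636 to `classify_dominance(1, 0)`.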

For the recessive and nonrecessive CNVs identified in ho- 
mozygous PS2 that were already categorized into dosage- 
sensitive, -insensitive, and -reversed groups, the CNVs 
were investigated for their allele frequencies (singleton or 
nonsingleton). Because both dosage-sensitive and dosage- 
reversed CNVs respond to copy-number changes, they were 
grouped together to compare with the overall allele fre- 
quency derived from all five homozygous strains. The group 
of dosage-insensitive CNVs corresponds to "recessive" in
that they yield no expression difference between PS2/PS3 and PS1/PS3.
As shown in figure 5B, the fraction of nonsingleton CNVs
is significantly increased (Fisher's exact test, P < 0.001) for
both recessive dosage-sensitive CNVs and dosage-insensitive
CNVs. However, we did not observe a significant difference
(Fisher's exact test, P = 0.137) in the fraction of
singletons and nonsingletons between nonrecessive dosage-sensitive
CNVs and overall CNVs.

Fig. 5. Reconstruction of recessive and nonrecessive CNVs. (A) Three groups of CNVs (dosage sensitive, dosage insensitive, and dosage reversed)
from homozygous PS2 were investigated for their expression in heterozygous PS2/PS3 (for details, see Results). The numbers of recessive (black bars)
and nonrecessive (gray bars) CNVs in dosage-sensitive, -insensitive, and -reversed CNVs of homozygous PS2 are plotted, respectively. Percentages of
recessive and nonrecessive CNVs are shown on the top. (B) Fractions of singleton and nonsingleton CNVs in the above groups of CNVs. Both dosage-sensitive
and -reversed CNVs respond to copy-number change such that they are grouped together in comparison with the overall allele frequency
derived from all five homozygous strains. Black bars indicate singleton CNVs. Gray bars indicate nonsingleton CNVs. Overall indicates singleton and
nonsingleton data collected from five homozygous strains. Asterisks indicate P < 0.001 in the comparison between either dosage sensitive (recessive) or
dosage insensitive and overall.

Dosage Effects of CNVs on Expression in Hetero- 
zygotes 

We also reconstructed heterozygous CNVs to directly corre- 
late heterozygous gene expression levels in PS2/PS3 and 
PS2/CS heterozygotes. In particular, for genes with variable 
copy number between homozygous PS2 and PS1, we asked
what happens to the expression of genes in the heterozy- 
gous state. This analysis differs from determining recessive 
CNVs classified as dosage sensitive, dosage insensitive, or 
dosage reversed in homozygous PS2. Gene expression 
may change in the heterozygous background. In this anal- 
ysis, horizontal comparisons (shown in fig. 2) of gene ex- 
pression levels between heterozygous PS1/PS3 and PS2/ 
PS3 (as well as PS1/CS and PS2/CS) were directly conducted 
and the CNVs classified in regard to dosage sensitivity in the 
heterozygous background. A total of 415 CNVs were 
pooled from PS2/PS3 and PS2/CS heterozygotes in contrast 
to PS1/PS3. Thirty-six CNVs showed dosage-sensitive 
effects, whereas 1 1 showed dosage-reversed effects. The 
remaining CNVs were dosage insensitive. Nearly 90% of 



the total CNVs fell into the dosage-insensitive group. The 
number of CNV increases and decreases is plotted sepa- 
rately for three groups in figure 6A, along with their corre- 
sponding fractions shown in figure 66. The fractions show 
no significant differences (chi-square test, P = 0.989). 
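The group sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts stated in the text (415 pooled CNVs, 36 dosage sensitive, 11 dosage reversed); the variable and function names are illustrative and are not taken from the authors' analysis.

```python
# Counts reported in the text (pooled PS2/PS3 and PS2/CS heterozygotes).
TOTAL_CNVS = 415
counts = {
    "dosage sensitive": 36,
    "dosage reversed": 11,
}
# The remaining CNVs are dosage insensitive by definition.
counts["dosage insensitive"] = TOTAL_CNVS - sum(counts.values())


def percentages(category_counts, total):
    """Return each category's share of the total, in percent (one decimal)."""
    return {name: round(100.0 * n / total, 1) for name, n in category_counts.items()}


pct = percentages(counts, TOTAL_CNVS)
for name, value in pct.items():
    print(f"{name}: {counts[name]} CNVs ({value}%)")
```

Under these counts, 368 CNVs are dosage insensitive (about 89%), consistent with the statement that nearly 90% of heterozygous CNVs fall into that group.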

Protein Interactions for CNV Genes 

Dopman and Hartl (2007) reported that the occurrence of
CNV is negatively correlated with the degree of connectivity
in the protein interaction network. Natural selection plays
critical roles in shaping CNV patterns, and dosage-sensitive
CNVs might be expected to have greater functional consequences
and fewer protein-protein interactions than dosage-insensitive
ones. As shown in figure 7A, the dosage-sensitive CNVs have
significantly fewer protein-protein interactions
than dosage-insensitive CNVs (dosage sensitive: one
protein interaction [median]; dosage insensitive: two protein
interactions [median]; P = 0.04, Mann-Whitney test). The
same pattern holds true for another measure of centrality:
betweenness (dosage sensitive, betweenness = 0 [median];
dosage insensitive, betweenness = 1,002 [median]; P = 0.02,
Mann-Whitney test).

Similarly, one would expect to see a higher number of
interactions in recessive CNVs than in nonrecessive
CNVs. Although the trend matched the prediction, the
difference was not significant (recessive: two protein
interactions [median]; nonrecessive: one protein interaction
[median]; P = 0.22, Mann-Whitney test), possibly due to
the relatively small sample size (fig. 7B). The same pattern
holds true for another measure of centrality: betweenness
(recessive, betweenness = 311 [median]; nonrecessive,
betweenness = 0 [median]; P = 0.34, Mann-Whitney test).

Fig. 6. Reconstruction of CNVs in heterozygotes and their effects
on gene expression. (A) As shown in figure 2, heterozygous PS2/PS3
expression can be compared with PS1/PS3 to infer CNVs' dosage effects
on heterozygotes. The heterozygote data were pooled from PS2/PS3
and PS2/CS together (for details, see Results) and plotted. The graph
shows the number of three groups of CNVs that are dosage sensitive,
dosage insensitive and dosage reversed. Het Ds: heterozygous dosage
sensitive; Het Di: heterozygous dosage insensitive; Het Dr: heterozygous
dosage reversed. (B) The fractions of dosage sensitive (gray), dosage
insensitive (black) and dosage reversed (white) for CNV increases and
decreases in heterozygotes are plotted separately.

Fig. 7. The degree of protein interactions for CNV genes. (A) The
number of protein interactions for dosage-sensitive and
dosage-insensitive CNV genes is plotted. Dosage-sensitive genes have
significantly fewer protein interactions. (B) The number of protein interactions
for recessive and nonrecessive CNV genes is plotted. Recessive genes
appear to have more protein interactions than nonrecessive genes.
However, the difference is not significant. Bold horizontal bars are the
median value, the box is the interquartile range, and the whiskers
indicate the 95% confidence interval.

Fig. 8. Upregulated and downregulated CNVs are matched with
copy-number increase and decrease. As shown in figure 2, heterozygous
PS1/PS3 expression can be compared with homozygous PS1 to
infer if CNVs are upregulated or downregulated in heterozygotes (here,
PS3 contains CNVs). The black bars indicate the number of CNVs in
which the PS3 (as well as CS, for details, see Results) CNV allele is
upregulated relative to PS1 in expression. The gray bars indicate
downregulation of the PS3 (or CS) allele relative to PS1. The left two
columns show the CNVs with copy-number increases and the right two
columns show the CNVs with copy-number decreases.

Upregulation and Downregulation in CNVs

To infer whether upregulated or downregulated CNVs match
their copy-number changes, we employed a slightly different
analysis. As illustrated in figure 2, PS1/PS1 and PS1/PS3
(or PS1/CS) were vertically compared, with PS3 (or CS)
containing the CNVs. A number of CNVs involving upregulation
and downregulation were detected. These CNVs were then
sorted based upon copy-number increases or decreases.
As shown in figure 8, 17 CNVs containing copy-number
increases in PS3 and CS were upregulated (higher copy
number and higher expression) relative to PS1 in heterozygous
PS1/PS3 or PS1/CS, whereas only two CNVs with
increases were downregulated (lower expression). In contrast,
more downregulation events were discovered in CNVs
with copy-number decreases. Twelve downregulations
(lower copy number and lower expression) were found
for PS3 and CS, whereas only four upregulated CNVs (higher
expression) were found for PS3 and CS. It appears that up-
and downregulated CNVs are positively associated with
gene copy-number changes.
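One way to quantify the association described above is to arrange the four reported counts in a 2 x 2 table and apply a Fisher exact test. The text does not report such a test for these counts, so the sketch below is purely illustrative: the table layout and the function name `fisher_exact_two_sided` are assumptions, and the test is implemented with the Python standard library rather than with the authors' software.

```python
from math import comb

# Counts from the text: rows = copy-number increase / decrease,
# columns = upregulated / downregulated relative to PS1.
table = [[17, 2],
         [4, 12]]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


(a, b), (c, d) = table
p_value = fisher_exact_two_sided(a, b, c, d)
concordant = (a + d) / (a + b + c + d)  # expression change in same direction as copy number
print(f"concordant fraction = {concordant:.2f}, two-sided P = {p_value:.2e}")
```

With these counts, 29 of the 35 CNVs change expression in the same direction as copy number, which is the pattern summarized in figure 8.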

Discussion 

CNV Pattern in Homozygous Genotypes 

Studies of copy-number variation in natural populations of 
D. melanogaster had previously been conducted with het- 
erozygous isofemale strains (Dopman and Hartl 2007; 
Emerson et al. 2008). In these cases, many low-frequency 
CNVs are heterozygous and may remain undetected. Here, 
we established a number of second-chromosome substitu- 
tion strains derived from a single Pennsylvania population 
(except strain CS) to evaluate CNV occurrence in completely 
homozygous genotypes. The results indicate extensive copy- 
number variation on the second chromosome of these flies. 
Indeed, within the context of our own experimental design,
the fraction of protein-coding genes harboring CNVs that
can be detected in homozygotes is higher than that in
heterozygotes. Furthermore, the fraction of CNVs detected in
homozygous genotypes is also higher than that reported
in previous studies with heterozygous genotypes. These
findings suggest that the aCGH microarray analysis
underestimates CNVs in heterozygous genotypes because of
diminished power to detect CNVs.

Previous studies also found more duplications than
deletions in fruit flies (Emerson et al. 2008). One possibility is
that duplicating a region may confer milder phenotypes
than deleting it, such that purifying selection may be
stronger against deletions in CNV genes. In addition, deletions
in the heterozygous state may be harder to detect
with aCGH arrays. In another study examining
mammalian genomes, the results suggested a strong bias against
duplications for genes whose protein products belong to
complexes, with less than a quarter of the CNVs scored
as gains (Schuster-Bockler et al. 2010). In contrast, we
found that the frequencies of copy-number increase
and decrease are exceptionally close in these substitution
lines, suggesting that CNV composition is highly variable
across species and within species.

CNV Dosage Sensitivity and Effects on Expression 

CNVs can have drastic phenotypic consequences as a result
of altering gene dosage, disrupting coding sequences, or
perturbing gene regulation. The degree of penetrance (the
fraction of a genotype that shows the associated phenotype) of
CNV-encompassed genes is essential to understanding the
impact of CNVs on expression and potentially their association
with genetic disorders (Beckmann et al. 2007). We
found that 13% of homozygous CNVs were dosage sensitive,
meaning that gene expression levels positively associate
with copy-number increase or decrease. Conversely, we
discovered that 8% of CNVs were dosage reversed, exhibiting
negative associations between expression and copy
number in homozygotes. The two categories were 9%
and 3%, respectively, for CNVs in heterozygous states.
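The three categories defined above amount to a simple decision rule on the direction of the copy-number change and the direction of any significant expression change. The sketch below is illustrative only: the function name `classify_dosage` and the +1/-1/0 encoding are assumptions made for this example, and the statistical criteria used to call a significant expression difference are not reproduced here.

```python
def classify_dosage(copy_change: int, expr_change: int) -> str:
    """Classify a CNV gene by how expression tracks copy number.

    copy_change: +1 for a copy-number increase, -1 for a decrease.
    expr_change: +1 for significantly higher expression, -1 for
                 significantly lower expression, 0 for no significant change.
    """
    if copy_change not in (1, -1):
        raise ValueError("copy_change must be +1 or -1 for a CNV")
    if expr_change not in (1, 0, -1):
        raise ValueError("expr_change must be +1, 0, or -1")
    if expr_change == 0:
        return "dosage insensitive"   # expression does not respond
    if expr_change == copy_change:
        return "dosage sensitive"     # expression follows copy number
    return "dosage reversed"          # expression moves the opposite way


# Hypothetical examples, not data from the study.
print(classify_dosage(+1, +1))  # duplication with higher expression
print(classify_dosage(-1, 0))   # deletion with unchanged expression
print(classify_dosage(+1, -1))  # duplication with lower expression
```

The same rule can be applied separately to homozygous and heterozygous comparisons, which is how a CNV can change category between the two states.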

Dosage-reversed CNVs were also discovered in human
genomes. In the case of copy-number duplications, 10%
of the CNVs in the human genome were found to be dosage
reversed (Stranger et al. 2007; Beckmann et al. 2007).
Schuster-Bockler et al. (2010) also reported a complex
relationship between copy number and expression level in
human heterozygous CNVs. For example, more than 10% of
the CNVs exhibited a dosage-reversed expression pattern in
their study. In addition to genes exhibiting changes in both
copy number and expression, the remaining 79% of the
CNVs in homozygotes or 89% of the CNVs in heterozygotes
in our study were not responsive to gene copy-number
changes (dosage insensitive). Similarly, around 65% of CNVs
were dosage insensitive in the study conducted by
Schuster-Bockler et al. (2010). All the above findings strongly
suggest an extremely complex relationship between gene copy
number and expression.

Young duplicated genes typically exhibit increased expression
divergence (Farre and Alba 2010). Under certain conditions,
gene duplications may reduce transcript levels or
even induce gene silencing. Conversely, deletion of a
transcriptional repressor could serve to elevate gene expression
(Stranger et al. 2007). Both factors could contribute to the
discovery of CNVs whose expression phenotype is dosage
reversed. On the other hand, dosage-insensitive CNVs could
arise if gene promoter regions were not duplicated or deleted
along with the CNV regions. Partial duplication or
deletion of genes also may not significantly affect gene expression
levels. Nevertheless, the presence of detectable gene
expression implies that at least one copy of the gene is present.
In such cases, a deletion of one of the other copies
did not significantly change expression.

CNVs can alter gene dosage without abolishing gene function
or changing phenotype. As shown in the Results, the
majority of CNVs were found to be dosage insensitive,
particularly among heterozygous CNVs. CNVs therefore appear to
be less likely to contain dosage-sensitive genes, indicating
that negative selection acts in shaping CNVs. Previously,
CNV genes encoding protein complexes were found
to be significantly underrepresented (Dopman and Hartl
2007; Schuster-Bockler et al. 2010). Hence, selection arising
from functional constraints influences the formation and
spread of CNV patterns.

The observation of little or no change in gene expression
despite a change in gene copy number suggests that cells
may compensate for changes in gene copy number
by modifying transcription. Dosage compensation
has been widely addressed in plants, worms, mammals,
and fruit flies (Charlesworth 1996; Birchler et al.
2005, 2007; Vicoso and Bachtrog 2009; Prestel et al.
2010). The molecular mechanism of dosage compensation
involves chromatin structure remodeling (Bachtrog et al.
2010; Prestel et al. 2010). Transcription factors, chromatin
proteins, and signal-transduction genes were found to be
predominantly responsible for dosage effects (Birchler
et al. 2001, 2005). However, the mechanisms by which
CNVs affect dosage compensation are not well understood.
CNV dosage effects on gene expression may depend
on local chromatin modifications or regulatory genes in the
dosage compensation cascades. Note that some CNVs
change dosage status from homozygotes to heterozygotes
(e.g., dosage-sensitive CNVs in homozygotes become
insensitive or vice versa), again suggesting a complex relationship
between gene dosage and expression.

CNVs are Largely Recessive in Heterozygous State 

Previous studies reported that 70% of differentially
expressed genes in homozygotes were masked in the
heterozygous state (Lemos et al. 2008). CNV-encompassed genes
appeared to show similar patterns in our study. More than
70% of the CNVs that were sensitive to copy-number
changes in contrasts between homozygous individuals
appeared to be recessive when in the heterozygote (fig. 5A).
This finding suggests a buffered response to structural
changes induced by CNVs and implies that a heterozygous
masking effect may protect genes from harmful consequences.
In the case of gene duplications, apparent masking in
heterozygotes may reflect the reduced power to detect
transcript abundance ratios of 3:2 in heterozygotes versus 4:2 in
homozygotes. Silencing by unpaired DNA might be another
mechanism operating in heterozygous CNVs (Shiu and
Metzenberg 2002). The unpaired copy of a gene might reduce
the expression of other homozygous copies in the genome.
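The detection-power argument for duplications can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that transcript abundance scales linearly with gene copy number; this is a simplification (most CNVs in this study are dosage insensitive), and the function name `expected_log2_ratio` is hypothetical.

```python
from math import log2


def expected_log2_ratio(variant_copies: int, reference_copies: int = 2) -> float:
    """Expected log2 expression ratio if expression were proportional to copy number."""
    return log2(variant_copies / reference_copies)


# A single-copy duplication carried on one chromosome (3 copies vs. 2)
# versus on both chromosomes (4 copies vs. 2).
het_ratio = expected_log2_ratio(3)  # heterozygote, 3:2
hom_ratio = expected_log2_ratio(4)  # homozygote, 4:2

print(f"heterozygote: {het_ratio:.3f} log2 units")
print(f"homozygote:   {hom_ratio:.3f} log2 units")
```

Under this assumption, the expected signal in a heterozygote is roughly 0.58 log2 units compared with 1.0 in a homozygote, so the same duplication produces a smaller and harder-to-detect expression difference when heterozygous.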

Consistent with a previous report (Dopman and Hartl
2007), we found that only 20% of the CNVs were
nonsingletons in the population. Interestingly, the fraction of
nonsingletons for recessive dosage-sensitive CNVs is increased
significantly (47%) compared with that of overall CNVs
(fig. 5B). The increase suggests that selection typically
prevents the spread of CNVs in natural populations but
is more tolerant of CNVs that are recessive in their
effects on expression. Consistent with this hypothesis, the
fractions of singletons and nonsingletons for nonrecessive
dosage-sensitive CNVs did not differ from those of overall
CNVs (fig. 5B). Another category in which the fraction of
nonsingletons increased significantly consists of
dosage-insensitive CNVs, most likely due to the low penetrance
of CNVs having little or no effect on phenotypes.

Selection May Constrain Protein Interactions for 
CNV Genes 

It is known that protein-coding changes may impair the
ability of a protein to form dependable network interactions
(Fraser et al. 2002). CNVs were reported to correlate
negatively with the degree of connectivity in the protein
interaction network (Dopman and Hartl 2007), indicating that
selection is likely to shape the CNV distribution. Here, we also
found that dosage-sensitive CNVs have fewer protein-protein
interactions than dosage-insensitive CNVs (fig. 7A). Dosage-insensitive
genes are more tolerant of structural changes, such that the
influence of their mutations on the protein network is kept
minimal. In contrast, stronger selection on central nodes may
result in dosage-sensitive genes showing fewer
protein-protein interactions and lower betweenness. Similarly,
recessive CNV genes were expected to have more protein
interactions than nonrecessive ones. However, possibly
due to a relatively small sample size, the difference was not
statistically significant in our study, although the
trend appeared consistent with the expectation (fig. 7B).

Up- and Downregulated CNVs Coincide with Copy-Number
Increase and Decrease

CNVs that are upregulated or downregulated in expression
were found to be positively associated with their copy-number
changes (fig. 8). The fraction of upregulated or downregulated
CNVs is higher than that of dosage-sensitive CNVs in
heterozygotes discussed above (~27% vs. ~10%). Some
trans-effects (or background effects from PS3) may be
involved in determining dosage effects in heterozygotes. In
the case of up- and downregulated CNVs (fig. 8), one
may expect that the fraction of singletons would increase
relative to that of overall CNVs because selection presumably
acts against gene copy-number changes. However, the
fractions of singletons and nonsingletons did not differ from those
of overall CNVs (data not shown).

Conclusions 

This study has revealed several important features of CNVs
in a number of second-chromosome substitution lines from
a single natural population of D. melanogaster, particularly
with respect to the balance between CNV-encompassed
gene dosage and expression. The fraction of CNVs among
homozygotes appeared to be higher than in heterozygotes,
indicating underestimation by aCGH arrays. We found many
cases of CNV genes that are sensitive to copy-number
changes. However, the majority of genes show no significant
change in expression with copy number. More than
70% of the dosage-sensitive CNVs are recessive in expression
in heterozygotes. Selection appears to prevent CNVs from spreading
in the population, as indicated by allele frequencies, recessive
CNVs, and protein interaction data. With this CNV and
expression association study in D. melanogaster, we have
gained a partial understanding of CNV dosage effects.
Many questions remain unresolved, such as how CNVs
are distributed on chromosomes other than the second and
what mechanisms cells employ to modulate CNV dosage
and reach a balance. Despite the critical role of CNVs in
shaping genotypes and phenotypes, the majority of the identified
CNVs have not yet been finely resolved to the nucleotide
level. Large CNV genotype data sets from different populations
are required to extensively study the roles of CNV in
genome evolution. Next-generation sequencing or a combination
of aCGH array and sequencing tools will enable us to
dissect this relationship at greater resolution.

Supplementary Material 

Supplementary figures S1-S3 and table S1 are available at
Genome Biology and Evolution online
(http://www.oxfordjournals.org/our_journals/gbe).

CNV raw data reported in this paper have been deposited 
in the Gene Expression Omnibus (GEO) database, 
www.ncbi.nlm.nih.gov/geo (accession no. GSE27632). 